Packages
Background and Goals
This CEWL (cutaneous evaporative water loss) data was measured in 3-5 technical replicates on the mid-dorsum of Blunt-nosed Leopard Lizards (Gambelia sila) between April and July 2021. In this R script, I check the distribution of replicates, omit outliers, and average the remaining replicates. The resulting means should be more precise and accurate estimates of the true CEWL for each lizard, and those values will be used in the analyses R script file. Please refer to doi: for the published scientific paper and full details.
Load Data
- Compile a list of the filenames I need to read in.
- Make a function that will read in the data from each csv, then name and organize the data correctly.
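The code for the first step is not shown in this rendered output. A minimal sketch, assuming the raw csv files sit directly in the data/CEWL folder that read_CEWL_file reads from (the helper name list_cewl_files is hypothetical):

```r
# Hypothetical helper: list the csv files in a folder.
# It returns bare file names, because read_CEWL_file() adds the folder itself.
list_cewl_files <- function(dir = "data/CEWL") {
  list.files(path = dir, pattern = "\\.csv$", ignore.case = TRUE)
}

# character(0) if the folder does not exist or holds no csv files
filenames <- list_cewl_files()
```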
read_CEWL_file <- function(filename) {
dat <- read.csv(file.path("data/CEWL", filename),
na.strings=c("","NA"),
# each csv has headers
header = TRUE
) %>%
# select only the relevant values
dplyr::select(date = Date,
time = Time,
status = Status,
ID_rep_no = Comments,
CEWL_g_m2h = 'TEWL..g..m2h..',
msmt_temp_C = 'AmbT..C.',
msmt_RH_percent = 'AmbRH....'
)
# return the dataframe for that single csv file
dat
}
- Apply the function I made to all of the filenames I compiled, then put all of those dataframes into one dataframe. This will print warnings saying that header and col.names are different lengths, because the data has extra notes columns that we read in but then drop. Additionally, filter out failed measurements and properly format data classes.
# apply function to get data from all csvs
all_CEWL_data <- lapply(filenames, read_CEWL_file) %>%
# paste all data files together into one df by row
reduce(rbind) %>%
# extract individual_ID and replicate number
dplyr::mutate(ID_rep_no = as.character(ID_rep_no),
ID_len = as.factor(nchar(ID_rep_no)),
individual_ID = as.factor(case_when(
ID_len == 7 ~ as.character(paste(substr(ID_rep_no, 1, 1),
substr(ID_rep_no, 3, 5),
sep = "-")),
ID_len == 6 & substr(ID_rep_no, 1, 1) == "W"
~ as.character(paste(substr(ID_rep_no, 1, 1),
substr(ID_rep_no, 2, 4),
sep = "-")),
ID_len == 6 & substr(ID_rep_no, 1, 1) %in% c("M", "F")
~ as.character(paste(substr(ID_rep_no, 1, 1),
substr(ID_rep_no, 3, 4),
sep = "-")),
ID_len == 5 ~ as.character(paste(substr(ID_rep_no, 1, 1),
substr(ID_rep_no, 2, 3),
sep = "-")))
),
replicate_no = as.factor(case_when(
ID_len == 7 ~ as.character(substr(ID_rep_no, 7, 7)),
ID_len == 6 ~ as.character(substr(ID_rep_no, 6, 6)),
ID_len == 5 ~ as.character(substr(ID_rep_no, 5, 5))
))) %>%
# filter out failed measurements
dplyr::filter(status == "Normal") %>%
# correctly format data classes
mutate(date = as.Date(date, format = "%m/%d/%y"),
time = as.POSIXct(time, format = "%H:%M"),
status = as.factor(status)
)
summary(all_CEWL_data)
## date time status
## Min. :2021-04-23 Min. :2023-11-09 01:00:00.00 Normal:456
## 1st Qu.:2021-04-24 1st Qu.:2023-11-09 02:24:45.00
## Median :2021-05-07 Median :2023-11-09 03:46:00.00
## Mean :2021-05-12 Mean :2023-11-09 04:22:43.42
## 3rd Qu.:2021-05-08 3rd Qu.:2023-11-09 05:02:15.00
## Max. :2021-07-14 Max. :2023-11-09 12:59:00.00
##
## ID_rep_no CEWL_g_m2h msmt_temp_C msmt_RH_percent ID_len
## Length:456 Min. :-1.32 Min. :18.90 Min. :11.50 5:122
## Class :character 1st Qu.: 7.74 1st Qu.:28.50 1st Qu.:14.50 6:244
## Mode :character Median :10.21 Median :30.30 Median :16.95 7: 90
## Mean :10.62 Mean :29.55 Mean :21.36
## 3rd Qu.:12.89 3rd Qu.:31.50 3rd Qu.:24.30
## Max. :65.31 Max. :33.70 Max. :41.60
##
## individual_ID replicate_no
## F-12 : 13 1:117
## M-10 : 13 2:118
## M-11 : 13 3:118
## M-19 : 13 4: 52
## M-20 : 13 5: 51
## M-09 : 12
## (Other):379
## [1] F-01 F-10 F-11 F-12 F-13 F-14 F-15 F-16 F-17 F-18 F-19 F-02
## [13] F-03 F-04 F-05 F-06 F-07 F-08 F-09 M-01 M-10 M-11 M-12 M-13
## [25] M-14 M-15 M-16 M-17 M-18 M-19 M-02 M-20 M-03 M-04 M-05 M-06
## [37] M-07 M-08 M-09 W-010 W-011 W-012 W-013 W-014 W-015 W-016 W-017 W-018
## [49] W-019 W-002 W-020 W-021 W-022 W-023 W-024 W-025 W-026 W-027 W-028 W-003
## [61] W-004 W-005 W-006 W-007 W-008 W-009 W-029 W-030 W-031 W-001 W-032 F-08A
## [73] W-034 W-035 W-033 W-036 W-037 W-038 M-03A W-039
## 80 Levels: F-01 F-02 F-03 F-04 F-05 F-06 F-07 F-08 F-08A F-09 F-10 ... W-039
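The position-based case_when used above depends on the exact length of each comment string. As an alternative sketch (not the method used in this script), a single regular expression can split the comment into prefix letter, ID number, and replicate number. It assumes every comment follows one of the patterns seen in this document (for example M01_1, F-11_1, W013_1, M-03A-1); the function name parse_id_rep is hypothetical.

```r
# Hypothetical helper: parse "ID + replicate" comment strings with one regex.
# Strings that do not match the assumed pattern come back as NA.
parse_id_rep <- function(id_rep_no) {
  # letter, optional dash, digits plus optional letter suffix, separator, replicate digit
  pattern <- "^([A-Z])-?([0-9]+[A-Z]?)[_-]([0-9])$"
  ok <- grepl(pattern, id_rep_no)
  data.frame(
    ID_rep_no     = id_rep_no,
    individual_ID = ifelse(ok, sub(pattern, "\\1-\\2", id_rep_no), NA_character_),
    replicate_no  = ifelse(ok, sub(pattern, "\\3", id_rep_no), NA_character_),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_id_rep(c("M01_1", "F-11_1", "W013_1", "M-03A-1", "W-031_5"))
parsed
```

Returning NA for unmatched strings makes unexpected comment formats easy to spot, which the length-based approach would silently leave as NA as well.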
Check Data
Each lizard measured on each date should have 3-5 technical replicates, and those measurements should have been taken around the same time.
all_CEWL_data %>%
group_by(individual_ID, date) %>%
summarise(n = n(),
time_range = max(time) - min(time)) %>%
arrange(n)
## `summarise()` has grouped output by 'individual_ID'. You can override using the
## `.groups` argument.
## # A tibble: 118 × 4
## # Groups: individual_ID [80]
## individual_ID date n time_range
## <fct> <date> <int> <drtn>
## 1 F-01 2021-04-23 3 120 secs
## 2 F-02 2021-04-23 3 120 secs
## 3 F-03 2021-04-23 3 120 secs
## 4 F-04 2021-04-23 3 60 secs
## 5 F-05 2021-04-24 3 120 secs
## 6 F-06 2021-04-24 3 120 secs
## 7 F-07 2021-04-24 3 60 secs
## 8 F-08 2021-04-24 3 60 secs
## 9 F-09 2021-04-24 3 120 secs
## 10 F-10 2021-04-24 3 120 secs
## # … with 108 more rows
The number of measurements taken is good! Almost always 3 or 5, with two lizards who only got 4 measurements, which is fine. But M-01 on April 23 and M-03A on July 14 have abnormal time ranges of 43140 seconds (almost 12 h), so we need to check that data.
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-23 2023-11-09 12:57:00 Normal M01_1 0.69 31.0
## 2 2021-04-23 2023-11-09 12:59:00 Normal M01_2 0.14 30.7
## 3 2021-04-23 2023-11-09 01:00:00 Normal M01_3 1.12 30.5
## 4 2021-07-14 2023-11-09 12:58:00 Normal M-03A-1 9.98 27.4
## 5 2021-07-14 2023-11-09 12:59:00 Normal M-03A-2 9.16 27.8
## 6 2021-07-14 2023-11-09 01:00:00 Normal M-03A-3 11.05 28.0
## 7 2021-07-14 2023-11-09 01:01:00 Normal M-03A-4 13.29 28.1
## 8 2021-07-14 2023-11-09 01:02:00 Normal M-03A-5 8.69 28.4
## 9 2021-07-14 2023-11-09 05:00:00 Normal M-01-1 13.70 27.4
## 10 2021-07-14 2023-11-09 05:01:00 Normal M-01-2 10.94 27.2
## 11 2021-07-14 2023-11-09 05:02:00 Normal M-01-3 11.35 27.0
## 12 2021-07-14 2023-11-09 05:03:00 Normal M-01-4 9.39 26.8
## 13 2021-07-14 2023-11-09 05:04:00 Normal M-01-5 8.90 26.6
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 15.9 5 M-01 1
## 2 16.3 5 M-01 2
## 3 16.7 5 M-01 3
## 4 37.1 7 M-03A 1
## 5 36.8 7 M-03A 2
## 6 37.1 7 M-03A 3
## 7 35.9 7 M-03A 4
## 8 35.2 7 M-03A 5
## 9 39.7 6 M-01 1
## 10 39.6 6 M-01 2
## 11 39.5 6 M-01 3
## 12 39.6 6 M-01 4
## 13 39.6 6 M-01 5
Aha, it seems the problem is that the time isn’t perfectly formatted, so 1 pm is coded as 1 am. The measurements in question ran from 12 noon into 1 pm, so once reformatted they appear to span 1 am to 12 pm. It’s fine as-is, and nothing is amiss with the data.
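The numbers fit that explanation: 43140 seconds is 11 h 59 min, exactly the gap between 01:00 and 12:59. If elapsed times were ever needed, one option would be to convert the clock times to minutes on a 24-hour scale. The sketch below assumes every measurement was taken between 12:00 noon and 11:59 pm, which this document does not state; the function name is hypothetical.

```r
# Hypothetical helper: convert "H:MM" clock strings with no AM/PM marker
# to minutes since midnight, ASSUMING all times are noon or later.
to_minutes_pm <- function(time_chr) {
  parts <- strsplit(time_chr, ":", fixed = TRUE)
  hours <- as.integer(vapply(parts, `[`, character(1), 1))
  mins  <- as.integer(vapply(parts, `[`, character(1), 2))
  # hours 1 to 11 are treated as pm; 12 stays as noon
  hours_24 <- ifelse(hours < 12L, hours + 12L, hours)
  hours_24 * 60L + mins
}

# the three M-01 replicates from April 23
m01 <- to_minutes_pm(c("12:57", "12:59", "01:00"))
diff(range(m01))  # elapsed minutes between first and last replicate
```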
Replicates
Assess Variation
We want the Coefficient of Variation (CV) among our technical replicates to be small. We need to calculate it to identify whether there may be outliers.
CVs <- all_CEWL_data %>%
group_by(individual_ID, date) %>%
summarise(mean = mean(CEWL_g_m2h),
SD = sd(CEWL_g_m2h),
CV = (SD/mean) *100,
min = min(CEWL_g_m2h),
max = max(CEWL_g_m2h),
range = max - min
)
## `summarise()` has grouped output by 'individual_ID'. You can override using the
## `.groups` argument.
## individual_ID date mean SD
## F-12 : 3 Min. :2021-04-23 Min. : 0.650 Min. : 0.1124
## M-09 : 3 1st Qu.:2021-04-24 1st Qu.: 8.486 1st Qu.: 1.4849
## M-10 : 3 Median :2021-04-24 Median :10.443 Median : 2.0290
## M-11 : 3 Mean :2021-05-08 Mean :10.823 Mean : 2.9641
## M-19 : 3 3rd Qu.:2021-05-08 3rd Qu.:13.391 3rd Qu.: 3.1195
## M-20 : 3 Max. :2021-07-14 Max. :31.550 Max. :29.3242
## (Other):100
## CV min max range
## Min. : 1.956 Min. :-1.320 Min. : 1.12 Min. : 0.220
## 1st Qu.: 15.021 1st Qu.: 6.723 1st Qu.:10.21 1st Qu.: 3.130
## Median : 20.135 Median : 8.245 Median :13.32 Median : 4.600
## Mean : 28.713 Mean : 8.159 Mean :14.36 Mean : 6.196
## 3rd Qu.: 35.639 3rd Qu.:10.500 3rd Qu.:16.37 3rd Qu.: 6.772
## Max. :105.713 Max. :19.640 Max. :65.31 Max. :52.900
##
We expect CV for technical replicates to be < 10-15%, so we must determine whether the CVs > 15% are due to outlier replicates. The range should also generally be within 5 units for these measurements. :(
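As a worked example of the calculation, here are the three F-02 replicates from 2021-04-23 (values taken from the raw measurements printed later in this document). The helper name cv_percent is only for illustration.

```r
# CV as a percentage: sample standard deviation divided by the mean
cv_percent <- function(x) sd(x) / mean(x) * 100

f02 <- c(26.69, 15.25, 15.65)
mean(f02)            # about 19.2
sd(f02)              # about 6.49
cv_percent(f02)      # about 33.8, well above the 15% target
cv_percent(f02[-1])  # without the first replicate: about 1.8
```

A single high replicate is enough to push the CV from under 2% to over 30%, which is why the next sections look for outliers within each replicate set.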
Find Outliers
First, create a function to look at the replicates for each individual on each day. For each iteration, I will make a boxplot and extract any outliers, compiling a dataframe of outliers that I want to exclude from the final dataset. By printing the boxplots and compiling a dataframe of outliers, I can check the data against the plots to ensure confidence in the outliers quantified.
# write function to find outliers for each individual on each date
find_outliers <- function(df) {
# initiate dataframe to compile info and list to compile plots
outliers <- data.frame()
#boxplots <- list()
# initiate a for loop to go through every individual in df
for(indiv_ch in unique(df$individual_ID)) {
# select data for only the individual of interest
df_sub <- df %>%
dplyr::filter(individual_ID == (indiv_ch))
# make a boxplot
df_sub %>%
ggplot(.) +
geom_boxplot(aes(x = as.factor(date),
y = CEWL_g_m2h,
fill = as.factor(date))) +
ggtitle(paste("Individual", indiv_ch)) +
theme_classic() -> plot
# print/save
print(plot)
#boxplots[[indiv_ch]] <- plot
# extract outliers
outs <- df_sub %>%
group_by(individual_ID, date) %>%
summarise(outs = boxplot.stats(CEWL_g_m2h)$out)
# add to running dataframe of outliers
outliers <- outliers %>%
rbind(outs)
}
#return(boxplots)
return(outliers)
}
Now apply the function to the data:
## `summarise()` has grouped output by 'individual_ID', 'date'. You can override
## using the `.groups` argument.
## # A tibble: 24 × 3
## # Groups: individual_ID, date [18]
## individual_ID date outs
## <fct> <date> <dbl>
## 1 F-13 2021-05-08 41.9
## 2 F-06 2021-05-08 13.8
## 3 M-10 2021-05-07 7.17
## 4 M-10 2021-05-07 2.79
## 5 M-11 2021-05-07 13.5
## 6 M-11 2021-07-14 11.5
## 7 M-13 2021-07-14 17.1
## 8 M-13 2021-07-14 11.3
## 9 M-19 2021-07-14 17.9
## 10 M-20 2021-07-14 11.3
## # … with 14 more rows
Based on the plots, the dataframe of outliers I compiled is correct. (yay!)
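For reference, the same kind of outlier table can be built without the plotting loop. This is a base-R sketch under the assumption that the data frame has the columns individual_ID, date, and CEWL_g_m2h used above; the function name find_outliers_base and the toy data are made up for illustration.

```r
# Hypothetical loop-free variant: one boxplot.stats() call per
# individual-by-date group, returning only the flagged values.
find_outliers_base <- function(df) {
  key <- paste(df$individual_ID, df$date)
  rows <- lapply(split(df, key), function(g) {
    out <- boxplot.stats(g$CEWL_g_m2h)$out
    if (length(out) == 0) return(NULL)
    data.frame(individual_ID = g$individual_ID[1],
               date = g$date[1],
               outs = out)
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) return(NULL)  # nothing flagged anywhere
  do.call(rbind, unname(rows))
}

# made-up data: one set of five with an obvious outlier, one set of three
toy <- data.frame(
  individual_ID = c(rep("A", 5), rep("B", 3)),
  date = as.Date("2021-05-07"),
  CEWL_g_m2h = c(10, 11, 12, 13, 40, 1, 2, 3)
)
toy_outliers <- find_outliers_base(toy)
toy_outliers
```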
Remove Outliers
Now I will create a secondary version of the same function, but instead of compiling outliers, I will omit them from the dataset.
# write function to find and exclude outliers
omit_outliers <- function(df) {
# initiate dataframe to compile the cleaned data
cleaned <- data.frame()
# initiate a for loop to go through every individual in df
for(indiv_ch in unique(df$individual_ID)) {
# select data for only the individual of interest
df_sub <- df %>%
dplyr::filter(individual_ID == (indiv_ch))
# extract outliers
outs <- df_sub %>%
group_by(individual_ID, date) %>%
summarise(outs = boxplot.stats(CEWL_g_m2h)$out)
# filter outliers from data subset for this individual
filtered <- df_sub %>%
dplyr::filter(CEWL_g_m2h %nin% outs$outs)
# add to running dataframe of cleaned data
cleaned <- cleaned %>%
rbind(filtered)
}
return(cleaned)
}
Apply function to data and check that the new data subsets still contain the right amount of data:
outliers_omitted <- omit_outliers(all_CEWL_data)
nrow(all_CEWL_data) == nrow(outliers_omitted) + nrow(outliers_found)
## [1] TRUE
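One thing to keep in mind: omit_outliers drops rows by matching CEWL values across all dates for an individual, so a value flagged on one date would also be dropped on another date if it happened to recur there. The row-count check above shows that did not happen here. A sketch of a variant that flags outliers strictly within each individual-by-date group (base R; the function name and toy data are hypothetical):

```r
# Hypothetical variant: mark each row as outlier or not within its own
# individual-by-date group, then keep the unmarked rows.
omit_outliers_by_group <- function(df) {
  key <- paste(df$individual_ID, df$date)
  is_out <- ave(df$CEWL_g_m2h, key,
                FUN = function(x) x %in% boxplot.stats(x)$out) == 1
  df[!is_out, , drop = FALSE]
}

# made-up data: 40 is an outlier on the first date but ordinary on the second
toy <- data.frame(
  individual_ID = "A",
  date = as.Date(c(rep("2021-05-07", 5), rep("2021-07-14", 3))),
  CEWL_g_m2h = c(10, 11, 12, 13, 40, 40, 41, 42)
)
toy_clean <- omit_outliers_by_group(toy)
nrow(toy_clean)  # only the first-date 40 is removed
```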
Re-Assess Variation
new_CVs <- outliers_omitted %>%
group_by(individual_ID, date) %>%
summarise(mean = mean(CEWL_g_m2h),
SD = sd(CEWL_g_m2h),
CV = (SD/mean) *100,
min = min(CEWL_g_m2h),
max = max(CEWL_g_m2h),
range = max - min)
## `summarise()` has grouped output by 'individual_ID'. You can override using the
## `.groups` argument.
## individual_ID date mean SD
## F-12 : 3 Min. :2021-04-23 Min. : 0.650 Min. : 0.05508
## M-09 : 3 1st Qu.:2021-04-24 1st Qu.: 8.486 1st Qu.: 1.21719
## M-10 : 3 Median :2021-04-24 Median :10.421 Median : 1.85776
## M-11 : 3 Mean :2021-05-08 Mean :10.682 Mean : 2.65196
## M-19 : 3 3rd Qu.:2021-05-08 3rd Qu.:13.239 3rd Qu.: 2.88268
## M-20 : 3 Max. :2021-07-14 Max. :31.550 Max. :29.32424
## (Other):100
## CV min max range
## Min. : 1.032 Min. :-1.320 Min. : 1.120 Min. : 0.100
## 1st Qu.: 13.433 1st Qu.: 6.723 1st Qu.: 9.985 1st Qu.: 2.518
## Median : 19.265 Median : 8.420 Median :12.545 Median : 4.060
## Mean : 25.543 Mean : 8.287 Mean :13.639 Mean : 5.352
## 3rd Qu.: 33.436 3rd Qu.:10.502 3rd Qu.:15.520 3rd Qu.: 6.197
## Max. :105.713 Max. :19.640 Max. :65.310 Max. :52.900
##
This definitely improved things, but unfortunately, CVs are still skewed to the right. I think the replicate groups with only 3 replicates are harder to find outliers in.
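This can be made concrete. boxplot.stats flags a value only if it lies more than 1.5 times the hinge spread beyond a hinge. For three sorted values a ≤ b ≤ c, the upper hinge is (b + c)/2 and the hinge spread is (c - a)/2, so c would be flagged only if c - b > 1.5(c - a), which can never hold; the same argument applies to the low end. A set of three replicates therefore can never produce an outlier with this method. A short check, using the F-02 replicates from this dataset and a made-up set of five:

```r
# three replicates: the high first value is NOT flagged
f02 <- c(26.69, 15.25, 15.65)
out_three <- boxplot.stats(f02)$out
length(out_three)  # no values flagged

# made-up set of five: the same kind of extreme value IS flagged
five <- c(10, 11, 12, 13, 40)
out_five <- boxplot.stats(five)$out
out_five
```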
Check the info for lizards with super high value ranges:
## # A tibble: 14 × 8
## # Groups: individual_ID [14]
## individual_ID date mean SD CV min max range
## <fct> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 F-02 2021-04-23 19.2 6.49 33.8 15.2 26.7 11.4
## 2 F-05 2021-04-24 31.6 29.3 92.9 12.4 65.3 52.9
## 3 F-06 2021-04-24 18.7 8.92 47.6 12.3 28.9 16.6
## 4 F-11 2021-05-08 14.4 5.34 37.1 7.76 20.3 12.6
## 5 F-14 2021-04-24 15.5 9.35 60.3 9.3 26.2 17.0
## 6 F-17 2021-05-08 13.4 4.56 34.0 8.78 19.0 10.2
## 7 M-10 2021-04-24 24.2 5.92 24.4 19.2 30.8 11.5
## 8 W-013 2021-04-24 16.6 5.91 35.7 12.6 23.4 10.7
## 9 W-016 2021-04-24 13.9 5.35 38.5 9.48 19.8 10.3
## 10 W-017 2021-04-24 16.7 6.62 39.6 12.0 24.3 12.3
## 11 W-024 2021-04-24 16.7 5.34 31.9 11.4 22.0 10.7
## 12 W-026 2021-04-25 22.3 17.7 79.5 10.0 42.6 32.5
## 13 W-031 2021-05-07 4.56 4.49 98.4 -1.32 10.4 11.8
## 14 W-037 2021-05-08 14.7 4.67 31.8 7.48 19.8 12.3
Look at the original CEWL measurements for those lizards:
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-23 2023-11-09 01:52:00 Normal F02_1 26.69 33.1
## 2 2021-04-23 2023-11-09 01:53:00 Normal F02_2 15.25 33.7
## 3 2021-04-23 2023-11-09 01:54:00 Normal F02_3 15.65 33.4
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 16.9 5 F-02 1
## 2 17.1 5 F-02 2
## 3 16.4 5 F-02 3
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-24 2023-11-09 12:19:00 Normal F05_2 65.31 31.0
## 2 2021-04-24 2023-11-09 12:20:00 Normal F05_3 16.93 29.1
## 3 2021-04-24 2023-11-09 12:21:00 Normal F05_4 12.41 29.2
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 21.0 5 F-05 2
## 2 20.6 5 F-05 3
## 3 20.3 5 F-05 4
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-24 2023-11-09 12:51:00 Normal F06_1 28.92 30.6
## 2 2021-04-24 2023-11-09 12:52:00 Normal F06_2 14.97 30.4
## 3 2021-04-24 2023-11-09 12:53:00 Normal F06_3 12.31 30.1
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 19.6 5 F-06 1
## 2 18.7 5 F-06 2
## 3 18.8 5 F-06 3
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-05-08 2023-11-09 02:11:00 Normal F-11_1 19.35 31.2
## 2 2021-05-08 2023-11-09 02:12:00 Normal F-11_2 20.32 31.5
## 3 2021-05-08 2023-11-09 02:13:00 Normal F-11_3 12.95 32.3
## 4 2021-05-08 2023-11-09 02:13:00 Normal F-11_4 11.51 33.1
## 5 2021-05-08 2023-11-09 02:14:00 Normal F-11_5 7.76 32.1
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 13.7 6 F-11 1
## 2 15.7 6 F-11 2
## 3 16.6 6 F-11 3
## 4 17.6 6 F-11 4
## 5 13.8 6 F-11 5
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-24 2023-11-09 12:02:00 Normal F14_1 26.25 26.5
## 2 2021-04-24 2023-11-09 12:03:00 Normal F14_2 9.30 28.1
## 3 2021-04-24 2023-11-09 12:05:00 Normal F14_3 10.95 28.3
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 23.0 5 F-14 1
## 2 24.1 5 F-14 2
## 3 22.1 5 F-14 3
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-05-08 2023-11-09 01:53:00 Normal F-17_1 18.96 31.1
## 2 2021-05-08 2023-11-09 01:53:00 Normal F-17_2 17.65 31.1
## 3 2021-05-08 2023-11-09 01:54:00 Normal F-17_3 10.69 31.2
## 4 2021-05-08 2023-11-09 01:54:00 Normal F-17_4 8.78 30.7
## 5 2021-05-08 2023-11-09 01:55:00 Normal F-17_5 11.04 30.2
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 15.2 6 F-17 1
## 2 14.3 6 F-17 2
## 3 14.1 6 F-17 3
## 4 14.1 6 F-17 4
## 5 14.5 6 F-17 5
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-24 2023-11-09 12:32:00 Normal M10_1 30.79 29.1
## 2 2021-04-24 2023-11-09 12:33:00 Normal M10_2 19.25 28.7
## 3 2021-04-24 2023-11-09 12:34:00 Normal M10_3 22.71 29.2
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 19.8 5 M-10 1
## 2 20.3 5 M-10 2
## 3 20.3 5 M-10 3
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-24 2023-11-09 03:58:00 Normal W013_1 23.36 31.5
## 2 2021-04-24 2023-11-09 03:58:00 Normal W013_2 13.69 31.4
## 3 2021-04-24 2023-11-09 03:59:00 Normal W013_3 12.63 31.1
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 15.6 6 W-013 1
## 2 15.2 6 W-013 2
## 3 15.1 6 W-013 3
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-24 2023-11-09 04:27:00 Normal W016_1 19.83 28.6
## 2 2021-04-24 2023-11-09 04:28:00 Normal W016_2 12.33 28.8
## 3 2021-04-24 2023-11-09 04:29:00 Normal W016_3 9.48 28.9
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 15.7 6 W-016 1
## 2 15.2 6 W-016 2
## 3 15.3 6 W-016 3
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-24 2023-11-09 03:29:00 Normal W017_1 24.31 30.9
## 2 2021-04-24 2023-11-09 03:30:00 Normal W017_2 13.88 30.7
## 3 2021-04-24 2023-11-09 03:31:00 Normal W017_3 12.02 30.6
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 17.5 6 W-017 1
## 2 15.6 6 W-017 2
## 3 15.3 6 W-017 3
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-24 2023-11-09 02:56:00 Normal W024_1 22.02 31.1
## 2 2021-04-24 2023-11-09 02:57:00 Normal W024_2 11.35 31.6
## 3 2021-04-24 2023-11-09 02:58:00 Normal W024_3 16.78 31.8
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 20.1 6 W-024 1
## 2 18.8 6 W-024 2
## 3 18.6 6 W-024 3
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-04-25 2023-11-09 02:36:00 Normal W026_1 42.56 19.1
## 2 2021-04-25 2023-11-09 02:37:00 Normal W026_2 14.20 19.1
## 3 2021-04-25 2023-11-09 02:37:00 Normal W026_3 10.04 18.9
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 36.5 6 W-026 1
## 2 35.6 6 W-026 2
## 3 36.0 6 W-026 3
outliers_omitted %>%
dplyr::filter(individual_ID == "W-031" & date == "2021-05-07") # def need to remove negative value, yikes
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-05-07 2023-11-09 04:54:00 Normal W-031_1 5.49 30.1
## 2 2021-05-07 2023-11-09 04:55:00 Normal W-031_2 1.84 30.3
## 3 2021-05-07 2023-11-09 04:56:00 Normal W-031_3 -1.32 29.7
## 4 2021-05-07 2023-11-09 04:57:00 Normal W-031_4 10.43 29.7
## 5 2021-05-07 2023-11-09 04:58:00 Normal W-031_5 6.35 29.8
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 23.3 7 W-031 1
## 2 23.0 7 W-031 2
## 3 23.4 7 W-031 3
## 4 23.7 7 W-031 4
## 5 23.6 7 W-031 5
## date time status ID_rep_no CEWL_g_m2h msmt_temp_C
## 1 2021-05-08 2023-11-09 04:20:00 Normal W-037_1 19.75 31.4
## 2 2021-05-08 2023-11-09 04:21:00 Normal W-037_2 17.11 32.0
## 3 2021-05-08 2023-11-09 04:22:00 Normal W-037_3 15.94 31.6
## 4 2021-05-08 2023-11-09 04:23:00 Normal W-037_4 13.18 31.1
## 5 2021-05-08 2023-11-09 04:24:00 Normal W-037_5 7.48 31.1
## msmt_RH_percent ID_len individual_ID replicate_no
## 1 13.4 7 W-037 1
## 2 14.7 7 W-037 2
## 3 12.9 7 W-037 3
## 4 12.8 7 W-037 4
## 5 12.8 7 W-037 5
Remove Extreme Values
evs_omitted <- outliers_omitted %>%
dplyr::filter(!(individual_ID == "F-02" & CEWL_g_m2h == 26.69)) %>%
dplyr::filter(!(individual_ID == "F-05" & CEWL_g_m2h == 65.31)) %>%
dplyr::filter(!(individual_ID == "F-06" & CEWL_g_m2h == 28.92)) %>%
dplyr::filter(!(individual_ID == "F-14" & CEWL_g_m2h == 26.25)) %>%
dplyr::filter(!(individual_ID == "M-10" & CEWL_g_m2h == 30.79)) %>%
dplyr::filter(!(individual_ID == "W-013" & CEWL_g_m2h == 23.36)) %>%
dplyr::filter(!(individual_ID == "W-017" & CEWL_g_m2h == 24.31)) %>%
dplyr::filter(!(individual_ID == "W-026" & CEWL_g_m2h == 42.56)) %>%
dplyr::filter(!(individual_ID == "W-031" & CEWL_g_m2h == -1.32)) %>%
dplyr::filter(!(individual_ID == "W-037" & CEWL_g_m2h == 7.48))
nrow(outliers_omitted) == nrow(evs_omitted) + 10
## [1] TRUE
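The chain of filters above could also be written as a small lookup table of values to drop, which keeps the list of exclusions in one place. This is a sketch, not the code used here: the helper omit_listed is hypothetical, and it matches CEWL values after rounding to two decimals, assuming the instrument reports two decimals as in the printed output.

```r
# Hypothetical helper: drop rows whose (individual_ID, CEWL value) pair
# appears in an exclusion table. Values are compared as 2-decimal strings.
omit_listed <- function(df, exclusions) {
  key_df <- paste(df$individual_ID, sprintf("%.2f", df$CEWL_g_m2h))
  key_ex <- paste(exclusions$individual_ID, sprintf("%.2f", exclusions$CEWL_g_m2h))
  df[!(key_df %in% key_ex), , drop = FALSE]
}

# two of the ten exclusions listed above, applied to a few rows of real values
exclusions <- data.frame(
  individual_ID = c("F-02", "W-031"),
  CEWL_g_m2h = c(26.69, -1.32)
)
rows <- data.frame(
  individual_ID = c("F-02", "F-02", "F-02", "W-031", "W-031"),
  CEWL_g_m2h = c(26.69, 15.25, 15.65, 1.84, -1.32)
)
kept <- omit_listed(rows, exclusions)
kept
```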
Re-Assess Variation
new_new_CVs <- evs_omitted %>%
group_by(individual_ID, date) %>%
summarise(mean = mean(CEWL_g_m2h),
SD = sd(CEWL_g_m2h),
CV = (SD/mean) *100,
min = min(CEWL_g_m2h),
max = max(CEWL_g_m2h),
range = max - min)
## `summarise()` has grouped output by 'individual_ID'. You can override using the
## `.groups` argument.
## individual_ID date mean SD
## F-12 : 3 Min. :2021-04-23 Min. : 0.650 Min. :0.05508
## M-09 : 3 1st Qu.:2021-04-24 1st Qu.: 8.486 1st Qu.:1.11937
## M-10 : 3 Median :2021-04-24 Median :10.381 Median :1.80387
## M-11 : 3 Mean :2021-05-08 Mean :10.272 Mean :1.98105
## M-19 : 3 3rd Qu.:2021-05-08 3rd Qu.:12.881 3rd Qu.:2.59019
## M-20 : 3 Max. :2021-07-14 Max. :21.673 Max. :5.34626
## (Other):100
## CV min max range
## Min. : 1.032 Min. : 0.140 Min. : 1.120 Min. : 0.100
## 1st Qu.: 11.976 1st Qu.: 6.723 1st Qu.: 9.985 1st Qu.: 2.118
## Median : 17.945 Median : 8.525 Median :12.485 Median : 3.695
## Mean : 22.421 Mean : 8.362 Mean :12.408 Mean : 4.047
## 3rd Qu.: 29.047 3rd Qu.:10.527 3rd Qu.:14.865 3rd Qu.: 5.308
## Max. :105.713 Max. :19.640 Max. :24.960 Max. :12.560
##
Another big improvement. :)
Average Replicates (outliers removed)
CEWL_avgs <- evs_omitted %>%
group_by(date, individual_ID) %>%
summarise(CEWL_g_m2h_mean = mean(CEWL_g_m2h),
CEWL_SD = sd(CEWL_g_m2h),
CEWL_CV = (CEWL_SD/CEWL_g_m2h_mean)*100,
msmt_temp_C = mean(msmt_temp_C),
msmt_RH_percent = mean(msmt_RH_percent))
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
## [1] 22.42122
CEWL_final <- CEWL_avgs %>%
dplyr::select(date, individual_ID,
CEWL_g_m2h = CEWL_g_m2h_mean,
msmt_temp_C, msmt_RH_percent) %>%
# calculate VPD based on Campbell & Norman 1998
mutate(e_s_kPa = 0.611 * exp((17.502*msmt_temp_C)/(msmt_temp_C + 240.97)),
msmt_VPD_kPa = e_s_kPa*(1 - (msmt_RH_percent/100))
)
head(CEWL_final)
## # A tibble: 6 × 7
## # Groups: date [1]
## date individual_ID CEWL_g_m2h msmt_temp_C msmt_RH_perc…¹ e_s_kPa msmt_…²
## <date> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2021-04-23 F-01 10.4 31.7 12.2 4.68 4.11
## 2 2021-04-23 F-02 15.4 33.6 16.8 5.19 4.32
## 3 2021-04-23 F-03 8.40 32.0 14.2 4.76 4.09
## 4 2021-04-23 F-04 9.21 25.0 26.0 3.17 2.35
## 5 2021-04-23 F-11 8.96 31.9 14.0 4.72 4.06
## 6 2021-04-23 F-12 1.49 32.2 13.5 4.82 4.17
## # … with abbreviated variable names ¹msmt_RH_percent, ²msmt_VPD_kPa
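To make the VPD step easier to follow, here is the same formula wrapped in two small functions and applied to the rounded temperature and humidity printed for F-01 above. The function names are hypothetical; the constants are the ones used in the mutate call.

```r
# saturation vapor pressure (kPa) from air temperature (deg C)
sat_vapor_pressure_kPa <- function(temp_C) {
  0.611 * exp((17.502 * temp_C) / (temp_C + 240.97))
}

# vapor pressure deficit (kPa) from temperature and relative humidity (%)
vpd_kPa <- function(temp_C, RH_percent) {
  sat_vapor_pressure_kPa(temp_C) * (1 - RH_percent / 100)
}

# F-01 on 2021-04-23, using the rounded values shown in the table
e_s <- sat_vapor_pressure_kPa(31.7)
vpd <- vpd_kPa(31.7, 12.2)
c(e_s = e_s, vpd = vpd)
```

With these rounded inputs the results come out near 4.67 kPa and 4.10 kPa, which agrees with the first row of the table to within rounding of the inputs.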
Final Synthesis
Re-Check Data
Check that we still have data for every individual.
I can check this by comparing original individual IDs to the individual IDs in our final dataset, then selecting/printing the IDs used that are in one but not the other.
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE
All is as expected. :)
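The comparison code is not shown in this rendered output. One way to sketch it in base R, with a hypothetical helper name and made-up IDs, is to report both the overall check and the IDs that appear in only one of the two sets:

```r
# Hypothetical helper: compare the IDs in the original and final datasets
check_ids <- function(original_ids, final_ids) {
  original_ids <- unique(as.character(original_ids))
  final_ids <- unique(as.character(final_ids))
  list(
    all_present = all(original_ids %in% final_ids),
    missing_from_final = setdiff(original_ids, final_ids),
    unexpected_in_final = setdiff(final_ids, original_ids)
  )
}

# made-up example where one individual was lost during cleaning
id_check <- check_ids(c("F-01", "F-02", "M-01", "F-01"), c("F-01", "M-01"))
id_check
```

Listing the mismatched IDs directly is easier to read than a long vector of TRUE values when something does go missing.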
Check how many observations were used to calculate mean CEWL for each individual on each date:
## `summarise()` has grouped output by 'individual_ID'. You can override using the
## `.groups` argument.
## # A tibble: 118 × 3
## # Groups: individual_ID [80]
## individual_ID date n
## <fct> <date> <int>
## 1 F-02 2021-04-23 2
## 2 F-05 2021-04-24 2
## 3 F-06 2021-04-24 2
## 4 F-14 2021-04-24 2
## 5 M-10 2021-04-24 2
## 6 W-013 2021-04-24 2
## 7 W-017 2021-04-24 2
## 8 W-026 2021-04-25 2
## 9 F-01 2021-04-23 3
## 10 F-03 2021-04-23 3
## # … with 108 more rows
Each mean is based on between 2 and 5 replicates.
Export
Save the cleaned data for models and figures.
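The export code is not shown in this rendered output. A minimal base-R sketch follows; the helper name is hypothetical and the output path is a temporary placeholder, not the file name used in this project.

```r
# Hypothetical helper: write a data frame to csv without row names
export_cewl <- function(df, path) {
  write.csv(df, file = path, row.names = FALSE)
  invisible(path)
}

# placeholder data frame standing in for CEWL_final
demo <- data.frame(
  date = as.Date("2021-04-23"),
  individual_ID = "F-01",
  CEWL_g_m2h = 10.4
)
out_path <- export_cewl(demo, tempfile(fileext = ".csv"))
reloaded <- read.csv(out_path)
reloaded
```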
Reporting
We omitted a total of 105 measurements from our CEWL dataset (456 - 351): 1 replicate was removed for most individuals. We used the boxplot.stats function in R to extract outliers from each set of technical replicates, and 24 points were removed this way (outliers_found dataframe). We removed an additional 10 extreme replicate values from rep groups with extremely high CEWL value ranges where the rest of the reps were very similar; eight of these ten came from rep sets of 3, where the outlier just couldn’t be detected statistically. After data cleaning, every individual still had 2-5 technical replicates for each of their measurement dates. The distribution of coefficient of variation values was much better after both data cleaning steps than before.